Event suppression method and system

ABSTRACT

A method and system for managing and dynamically suppressing event notification is provided. The method and system receives an indication of an event from a storage environment to be processed by a support system according to a set of default delivery parameters. Next, the method and system determines if one or more event specific delivery parameters have been associated with the event. If this is the case, then the method and system modifies the default delivery parameters for the information associated with the event according to the one or more event specific delivery parameters. Those event specific delivery parameters are also used to determine when to transmit a notification of the event to the support system. The dynamic suppression of events combines events gathered into an event log together into a set of one or more recurring events. From these events, the method and system then identifies a high frequency subset as one or more recurring events considered to occur at a higher frequency compared with a low frequency subset having one or more recurring events that occur at a lower frequency. Based on this information gathered, the method and system then eliminates a portion of the events in the high frequency subset until the frequency of events in the high frequency subset approximates the frequency of events in the low frequency subset.

INTRODUCTION

Computer and storage environments perform a variety of complex operations that need careful monitoring. To keep track of these operations, applications in these systems record information in an event log concerning the progress and potential problems encountered. Generally, the applications running on the computer and storage environments detect a certain set of conditions or events and then generate information corresponding to the event to facilitate tracking the condition or event at a later point in time.

Event information in some cases may indicate a normal progression towards the completion of certain tasks in the computer or storage environment. These events may be used in determining that a system is operating normally and performing certain expected functions. Other event information may instead indicate that a system is slowly or abruptly failing and corrective action may be needed to avert further problems. In either case, the event information helps ensure systems operate with a high degree of reliability, availability and serviceability.

Various computer-based support systems have been created to gather and manage the event information in these logs. The storage environment developed by Network Appliance, Inc. of Sunnyvale, Calif. incorporates a more sophisticated support system referred to as an Autosupport system. Applications running in this storage environment log event information and also send alerts to the Autosupport system. These alerts may be stored remotely for immediate consideration by support personnel employed or contracted by Network Appliance. The Autosupport system receives these alerts with the event information and performs one or more support functions in response. In some cases, the Autosupport system may send an automated message to the customer indicating one of a number of conditions ranging from an imminent system failure to perhaps an incorrect configuration. A corrective solution may also be suggested along with the message to the customer. In some cases, support personnel may phone the customer, travel to a customer site to repair a system or interactively contact the customer to assist with analyzing the event information and proposing solutions.

Unfortunately, an excessive number of events and event information may be generated as the number of applications running on computer and storage environments increases. Event logs storing the event information may rapidly fill and quickly need archiving. Conventional approaches to archiving data, which include rotating logs, tailing the last portion of the logs or overwriting the logs, are generally not acceptable. For example, a conventional support system may use “tailing” to reduce the size of a log having thousands of entries to only 200 entries by deleting all but the last 200 entries in the “tail” of the log.

In general, support systems using a conventional approach to managing these event logs may also eliminate critical information or make information difficult to obtain. For example, tailing may reduce the size of an event log to only the last several hundred entries but it also eliminates the preceding entries and information. This makes troubleshooting on computer and storage environments difficult as the entries and values in the event log are limited.
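The tailing behavior described above can be illustrated with a minimal Python sketch. The function name `tail_log` and the sample entries are hypothetical and serve only to show that everything before the last 200 entries is lost.

```python
from collections import deque

def tail_log(entries, keep=200):
    """Keep only the last `keep` entries; all earlier entries are discarded."""
    return list(deque(entries, maxlen=keep))

# Hypothetical log with thousands of entries.
log = [f"entry {i}" for i in range(5000)]
tailed = tail_log(log)

# Entries 0 through 4799 are no longer available for troubleshooting.
print(len(tailed), tailed[0], tailed[-1])
```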

Managing event logs is also a difficult problem to solve in advance since the frequency and volume of event information may change depending on the particular computer or storage environment installation. For example, the frequency of events may depend on dynamically changing data conditions on a system that may vary depending on the time of operation. Overall, it is difficult to determine the importance of entries within event logs in advance.

For these and other reasons, it is therefore desirable to create an improved system and method of managing event information entered into event logs and related transmission of events to support systems monitoring these events.

BRIEF DESCRIPTION OF THE DRAWINGS

The features of the present invention and the manner of attaining them, and the invention itself, will be best understood by reference to the following detailed description of embodiments of the invention, taken in conjunction with the accompanying drawings, wherein:

FIG. 1 is a schematic block diagram of an exemplary system providing storage and support in accordance with aspects of the present invention;

FIG. 2 is a schematic block diagram of a storage system that may be advantageously used with one implementation of the present invention;

FIG. 3 illustrates an exemplary storage operating system that implements one or more aspects of the present invention;

FIG. 4 is a flowchart diagram providing the operations for processing the generation of event information in accordance with one implementation of the present invention;

FIG. 5 contains an excerpt of a configuration file having configurable parameter entries for a named event in accordance with one implementation of the present invention;

FIG. 6 is a flowchart diagram of the operations for dynamic event suppression in accordance with one implementation of the present invention;

FIG. 7 is another flowchart diagram of the operations used by one implementation of the present invention to identify a high frequency subset of recurring events suitable for dynamic event suppression; and

FIGS. 8A and 8B graphically illustrate the dynamic suppression and elimination of event information in accordance with one implementation of the present invention.

SUMMARY OF THE INVENTION

Aspects of the present invention provide a method and system of managing notification of events associated with a storage environment. The management method and system includes receiving an indication of an event from the storage environment to be processed by a support system according to a set of default delivery parameters. The default delivery parameters generally indicate that all events or no events should be processed. Next, the method and system determines if one or more event specific delivery parameters have been associated with the event. For example, a named event may have particular delivery parameters specified in a file or a registry. If this is the case, then the method and system modifies the default delivery parameters for the information associated with the event according to the one or more event specific delivery parameters. These event specific delivery parameters may also be used to determine when to transmit a notification of the event to the support system.
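By way of illustration only, the following Python sketch shows one way default delivery parameters might be modified by event specific delivery parameters. The parameter names (`log_event`, `notify_support`, `min_interval_s`), the event names and the interval rule are all assumptions made for this example and are not taken from the specification.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class DeliveryParams:
    log_event: bool = True        # hypothetical: write the event to the local event log
    notify_support: bool = True   # hypothetical: transmit a notification to the support system
    min_interval_s: int = 0       # hypothetical: minimum seconds between notifications

# Default delivery parameters: here, "process all events".
DEFAULTS = DeliveryParams()

# Hypothetical event specific delivery parameters keyed by event name.
EVENT_OVERRIDES = {
    "disk.scrub.progress": {"notify_support": False},
    "fan.speed.warning": {"min_interval_s": 3600},
}

def effective_params(event_name, defaults=DEFAULTS, overrides=EVENT_OVERRIDES):
    """Start from the defaults and apply any parameters associated with the named event."""
    return replace(defaults, **overrides.get(event_name, {}))

def should_notify(event_name, now_s, last_sent_s=None):
    """Use the effective parameters to decide when a notification is transmitted."""
    params = effective_params(event_name)
    if not params.notify_support:
        return False
    if last_sent_s is None:
        return True
    return now_s - last_sent_s >= params.min_interval_s
```

An event with no specific entry falls through to the defaults, so only the named events behave differently.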

Another aspect of the present invention provides a method and system for dynamically suppressing events associated with a storage environment. The method and system combine events gathered into an event log together into a set of recurring events. From these recurring events, the method and system then identifies a high frequency subset as including recurring events considered to occur at a higher frequency compared with a low frequency subset having one or more recurring events that occur at a lower frequency. Based on this information gathered, the method and system then eliminates a portion of the events in the high frequency subset until the frequency of events in the high frequency subset approximates the frequency of events in the low frequency subset.
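The dynamic suppression just summarized can be sketched in Python as follows. This is a rough illustration under stated assumptions, not the method of the figures: splitting the recurring events at the median count, capping each high frequency group at the largest low frequency count, and keeping the earliest occurrences are all choices made only for this example, as is the `id` field used to recognize recurring events.

```python
from collections import Counter
from statistics import median

def suppress(events, key=lambda e: e["id"]):
    """Reduce high frequency recurring events toward the low frequency level."""
    # Combine events into recurring groups by a common key (assumed field "id").
    counts = Counter(key(e) for e in events)
    if not counts:
        return []
    # Assumption: groups with a count at or below the median form the low frequency subset.
    cutoff = median(counts.values())
    target = max(c for c in counts.values() if c <= cutoff)
    # Eliminate events from any group once it has reached the low frequency level.
    kept, seen = [], Counter()
    for event in events:
        k = key(event)
        seen[k] += 1
        if seen[k] <= target:
            kept.append(event)
    return kept

# Hypothetical log: one noisy event and two occasional ones.
events = [{"id": "A"}] * 50 + [{"id": "B"}] * 2 + [{"id": "C"}] * 3
kept = suppress(events)
print(Counter(e["id"] for e in kept))
```

With this input the noisy group "A" is cut from 50 entries to 3, while "B" and "C" are left untouched.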

DETAILED DESCRIPTION

Aspects of the present invention provide an improved approach for managing the growth of log files as used in computers and storage environments. Benefits provided by aspects of the present invention include, but are not limited to, one or more of the following. In storage environments, log files may receive many hundreds of entries in a short period of time as a result of events occurring on software and/or hardware components of the storage environment. These events not only have the potential for filling logs locally but also may overwhelm other support systems and people processing/analyzing these events. Instead of truncating or rotating logs or taking another similar approach, aspects of the present invention dynamically suppress the number of entries being generated, thus reducing the aggregate number of actual events. This has the benefit of reducing the overall number of events generated yet allows important events to be stored for later analysis and consideration.

It is also contemplated and recognized that certain processes or threads running in association with a storage environment may generate a disproportionate number of events when compared with other threads of execution. This makes it difficult to provide a static cap or limit to the number of events any particular application may generate. Instead, aspects of the present invention consider the overall number of events being generated in real-time and limit those events generated at a much higher frequency. Actual events and conditions occurring in a storage environment may change over time yet will be dynamically moderated in accordance with aspects of the present invention.

Further, aspects of the present invention allow certain specific delivery parameters for events to be adjusted as needed on the storage environment. Each event generated by the storage environment is named and delivered according to a set of configuration parameters set up in advance. Events generated by the storage environment can be tailored to accommodate the needs and conditions associated with the storage environment. The size of the event logs can be directly reduced by turning off or limiting the generation of certain events. The system for logging these events does not need to be completely turned off to limit event generation since each event can be individually configured.

FIG. 1 is a schematic block diagram of an exemplary system 100 providing storage and support systems in accordance with aspects of the present invention. System 100 in FIG. 1 includes clients 102/104, storage environment 106, an Autosupport system 116 and an Autosupport team 118 that may intervene or respond as notices of events from storage environment 106 are transmitted to the Autosupport system 116. It can be appreciated that Autosupport system 116 is one support system designed and used by Network Appliance, Inc. of Sunnyvale, Calif. using implementations of the present invention. Alternate implementations of the present invention can be applied to other support systems that manage and process local logs, remote logs and any other types of logs regardless of the number of entries being made in these logs and the type of computer, storage or other system placing event entries in the logs.

Clients 102/104 may be computers or other computer-like devices capable of accessing storage environment 106 either directly or indirectly over a network 114. In general, clients 102/104 may access storage environment 106 over network 114 using wireless or wired connections supporting one or more point-to-point links, shared local area networks (LAN), wide area networks (WAN) or other access technologies. These clients 102/104 may be accessing data, applications, raw storage or various combinations thereof stored on storage environment 106.

Storage environment 106 includes one or more storage systems represented as storage system 108 through storage system 110 and their corresponding storage devices 112 through storage devices 114. For example, storage system 108 (also referred to as filer 108) is a computer system that provides file and block access to the organization of information on storage devices 112, such as disks. Storage system 108 may include a storage operating system that implements a file system to logically organize the information as a hierarchical structure of directories and files on the disks. Each “on-disk” file may be implemented as a set of disk blocks configured to store information.

As used herein, the term storage operating system generally refers to the computer-executable code operable on a storage environment that manages data access and client access requests and may implement file system semantics in implementations involving filers. In this sense, the Data ONTAP® storage operating system, available from Network Appliance, Inc. of Sunnyvale, Calif., which implements a Write Anywhere File Layout® (WAFL®) file system, is an example of such a storage operating system implemented as a microkernel within an overall protocol stack and associated disk storage. The storage operating system can also be implemented as an application program operating over a general-purpose operating system, such as UNIX® or Windows NT®, or as a general-purpose operating system with configurable functionality, which is configured for storage applications as described herein.

In one implementation, storage devices 112 and 114 may be implemented using physical storage disks having one or more storage volumes to define an overall logical arrangement of storage space. Some filer implementations can serve a large number of storage volumes that may exceed 150 discrete units, for example. A storage volume is “loaded” in storage system 108 or 110 by copying the logical organization of the volume's files, data and directories into memory of storage system 108 or 110. Once a volume has been loaded in memory of a storage system, the volume may be “mounted” by one or more users, applications, or devices as long as they are permitted to access its contents and navigate its namespace. As used herein, a volume is said to be “in use” when it is loaded in a filer's memory and at least one user, application, etc. has mounted the volume and modified its contents.

Each file and directory stored in a filer is typically identified by a file-handle identifier or “file handle.” A file handle generally includes at least a volume identifier (V), a file identifier (F) and a generation number (G) that collectively describe a specific file or directory in the filer. The volume identifier indicates which storage volume in the filer contains the file or directory. The file identifier identifies the specific file or directory in the volume. For example, if the volume implements an inode-based file system, such as the WAFL® file system, the file identifier may correspond to an inode number of a file or directory within the volume. The generation number identifies a particular instance of the file or directory in the volume. For instance, if different versions of the same file are stored in the volume, each may be differentiated from the others by its corresponding generation number. In general, the largest generation number for a file or directory corresponds to its most recent version. It is contemplated that file handles may also include other information besides a volume identifier, file identifier and generation number. Accordingly, a variety of different file-handle implementations are envisioned to be within the scope of the present invention.
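As a minimal sketch of the file handle described above, the following Python fragment models the volume identifier (V), file identifier (F) and generation number (G) as a simple tuple. The class name, field names and sample values are hypothetical; actual file handles may carry additional information.

```python
from typing import NamedTuple

class FileHandle(NamedTuple):
    volume_id: int    # V: which storage volume contains the file or directory
    file_id: int      # F: for example, an inode number within the volume
    generation: int   # G: a particular instance of the file or directory

def most_recent(handles):
    """Among handles for the same file, the largest generation number is the most recent."""
    return max(handles, key=lambda h: h.generation)

# Three hypothetical instances of the same file (volume 7, inode 1042).
versions = [FileHandle(7, 1042, g) for g in (1, 2, 3)]
print(most_recent(versions))
```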

As illustrated in FIG. 1, storage systems like storage system 108 may be configured to operate according to a client/server model of information delivery thereby allowing multiple clients, such as client 102 and client 104, to access files simultaneously. In this model, client 102 may be a computer running an application, such as a file-system protocol, that connects to storage system 108 over a network 114 with one or more of the aforementioned network protocols, such as point-to-point links, shared LAN, WAN, or VPN as implemented over a public network such as the Internet. Communication between storage system 108 and client 102 is typically embodied as packets sent over the computer network. Each client may request the services of storage system 108 by issuing file-system protocol messages formatted in accordance with a conventional file-system protocol, such as the Common Internet File System (CIFS) or Network File System (NFS) protocol.

For example, client 102 and client 104 are configured to communicate with a file-access protocol engine of storage system 108 using a stateful or stateless file-system protocol. A stateful protocol, such as the CIFS protocol, is a connection-oriented protocol that requires storage system 108, e.g., the file-access protocol engine, and client 102 and client 104 to establish a communication session (or “virtual circuit”) through which they exchange information. Each communication session is then associated with session-specific “state” information, which may include, inter alia, authentication information, session identifiers, file-handle identifiers, and other related information. In the event the session is lost or interrupted, the state information for the communication session may be used to reestablish the session without having to re-authenticate client 102, client 104 or other clients as well as renegotiate many of the session parameters. Upon re-establishing the stateful-protocol session, storage system 108 typically invalidates the client's outstanding file handles and issues a new set of file handles to the client. Thus, any client requests that were lost as a result of the session failure can be “replayed” by client 102 or client 104 using the new set of file handles.

In contrast, a stateless protocol, such as the NFS protocol, does not require establishment of a formal communication session. Instead, requests from client 102, client 104 or other clients in a stateless protocol are authenticated and processed by the storage system 108 on a per-request basis rather than a per-session basis. That is, the validity of a client request in a stateless protocol is not bound to the duration of any specific communication session. Thus, unlike file handles used in stateful protocols, file handles in stateless protocols may remain valid even after the storage system has been temporarily shut down or disabled.

In operation, an event 128 may occur at some point in time during the operation of one or more storage systems 108 through 110. Event 128 may occur as the result of routine system status checks or more serious and imminent failures requiring more immediate attention. Various applications on storage environment 106 monitor a range of conditions and generate event information corresponding to the particular event 128. Accordingly, an increasingly large number of events 128 may result in large amounts of event information to be entered into an event log kept on each respective storage system. If the conditions persist or repeatedly occur, some applications may attempt to generate and store event information in these logs so rapidly that the event log may grow to an unmanageable size.

Likewise, applications on storage environment 106 may also transmit large amounts of event information to the Autosupport system 116. It is contemplated that Autosupport system 116 has been designed and configured to not only support storage environment 106 but many other storage environments in other locations (not shown). Autosupport system 116 helps avert large scale system failures or problems thus increasing overall storage environment 106 availability and minimizing or reducing downtime. In response to a particular event 128, applications or threads running on storage environment 106 send corresponding event information for processing by one or more of Autosupport servers 120 through 124. These Autosupport servers 120 through 124 may reference other archived events held on storage devices 122 through 126 respectively as well as optionally receive guidance from one or more members of the Autosupport team 118 to determine a resolution or plan of action.

It is important that the applications and threads on storage environment 106 do not repeatedly transmit redundant event information to Autosupport system 116. Doing so may overwhelm Autosupport servers 120 through 124 and reduce their ability to adequately notify operation clients 102/104 or personnel managing storage environment 106 of corrective actions. Aspects of the present invention address this and other concerns by suppressing certain event information before it is generated and/or transmitted to Autosupport system 116 for storage in the event log.

FIG. 2 is a schematic block diagram of storage system 108 that may be advantageously used with one implementation of the present invention. Storage system 108 includes a memory 202, a multi-port storage adapter 204, a processor 206, a network adapter 208, an NVRAM 210 and I/O ports 212 capable of communicating over interconnect 214. It is contemplated that aspects of the invention described herein may apply to any type of special-purpose computer (e.g., file serving appliance) or general-purpose computer, including a standalone computer, embodied as a storage environment. To that end, storage system 108 may be broadly, and alternatively, referred to as a component of the storage environment 106. Moreover, various aspects of the invention can be adapted to a variety of storage environment architectures including, but not limited to, a network-attached storage (NAS) environment, a storage area network (SAN) and disk assembly directly-attached to a client/host computer. The term “storage environment” should, therefore, be taken broadly to include such arrangements and combinations thereof.

The network adapter 208 comprises the mechanical, electrical and signaling circuitry needed to connect the storage system 108 to clients 102/104 over network 114, which may include a point-to-point connection or a shared medium, such as a LAN. Clients 102/104 may be general-purpose computers configured to execute applications, such as a file-system protocol. Moreover, clients 102/104 may interact with the storage system 108 in accordance with a client/server model of information delivery. That is, clients 102/104 may forward requests for the services of storage system 108, and storage system 108 may return the results of the services requested by the client, by exchanging packets encapsulated by a protocol format over the network 114 (e.g., the Common Internet File System (CIFS) protocol or Network File System (NFS)).

The NVRAM 210 provides fault-tolerant backup of data, enabling the integrity of storage system transactions to survive a service interruption based upon a power failure, or other fault. The size of the NVRAM is variable, although it is typically sized sufficiently to log a certain time-based chunk of transactions (for example, several seconds' worth). The NVRAM may store client requests corresponding to discrete client messages requesting file transactions, such as “WRITE,” “CREATE,” “OPEN,” and the like. Further, these entries may be logged in the NVRAM, typically according to the particular order they are completed. The use of the NVRAM for system backup and crash recovery operations is generally described in commonly assigned application Ser. No. 09/898,894, entitled “System and Method for Parallelized Replay of an NVRAM Log in a Storage Appliance” by Steven S. Watanabe et al. assigned to the assignee of the present invention and expressly incorporated herein by reference.

In the illustrative implementation in FIG. 2, memory 202 includes storage locations that are addressable by the processor and adapters for storing software program code and data. For example, memory 202 may include a form of random access memory (RAM) that is generally cleared by a power cycle or other reboot operation and classified as “volatile” memory. Processor 206 and various adapters may, in turn, comprise processing elements and/or logic circuitry configured to execute the software code and manipulate the data stored in the memory 202. The storage operating system 216, portions of which are typically resident in memory and executed by the processing elements, functionally organizes storage system 108 by, inter alia, invoking storage operations in support of a storage service implemented by storage system 108. While storage operating system 216 may operate alone, it is also contemplated that storage operating system 216 may execute within a run-time environment 218 that may include a general purpose operating system or virtualization environments that help improve utilization and efficient allocation of hardware and computing resources on storage system 108. It will be apparent to those skilled in the art that other processing and memory means, including various computer readable media, may be used for storing and executing program instructions pertaining to the inventive techniques described herein.

Multi-port storage adapter 204 cooperates with the storage operating system 216 and optionally run-time environment 218 executing on storage system 108 to access information requested by the one or more clients. Resulting information may be stored on the storage devices 112 that are attached, via the multi-port storage adapter 204, to the storage system 108 or other nodes of a storage environment as defined herein. The multi-port storage adapter 204 includes input/output (I/O) interface circuitry that couples to the storage devices 112 over an I/O interconnect arrangement, such as a conventional high-performance, Fibre Channel serial link topology. One or more interconnects on the multi-port storage adapter 204 may be used to provide higher throughput and/or reliability. The information is retrieved by the multi-port storage adapter 204 and, if necessary, processed by the processor 206 (or the multi-port storage adapter 204 itself) prior to being forwarded over interconnect 214 to the network adapter 208, where the information is formatted into one or more packets and returned to a requesting client.

In one implementation, the storage devices 112 are arranged into a plurality of volumes, each having a file system associated therewith. The volumes each include one or more disks. Implementations of the present invention configure the physical disks of storage devices 112 into RAID groups so that some disks store striped data and at least one disk stores separate parity for the data, in accordance with a preferred RAID 4 configuration. However, other configurations (e.g. RAID 5 having distributed parity across stripes, RAID 1 mirroring and others) are also contemplated. In a typical implementation, a volume is implemented as a multiplicity of RAID groups.
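The role of the separate parity disk in a RAID 4 group can be illustrated with a short Python sketch. It assumes the conventional bytewise XOR parity used by RAID 4; the function names and the sample blocks are hypothetical and the sketch ignores striping layout and block sizes.

```python
from functools import reduce

def parity(blocks):
    """Bytewise XOR of equally sized blocks, as stored on the dedicated parity disk."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def rebuild(surviving_blocks, parity_block):
    """Recover the block of a single failed data disk from the survivors and the parity."""
    return parity(list(surviving_blocks) + [parity_block])

# One stripe across three hypothetical data disks.
data = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
p = parity(data)

# If the second data disk fails, its block can be reconstructed.
recovered = rebuild([data[0], data[2]], p)
print(recovered == data[1])
```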

Referring to FIG. 3, an exemplary storage operating system 216 is illustrated that implements one or more aspects of the present invention. As previously described, the term “storage operating system” as used herein with respect to a storage system generally refers to the computer-executable code operable on a storage environment that implements file system semantics (such as the above-referenced WAFL®) and manages data access. In this sense, Data ONTAP® software is an example of such a storage operating system implemented as a microkernel. The storage operating system can also be implemented as an application program operating over a general-purpose operating system, such as UNIX® or Windows NT®, or as a general-purpose operating system with configurable functionality, which is configured for storage applications as described herein.

It should be understood that the organization of the storage operating system illustrated in FIG. 3 represents only one possible implementation. Accordingly, it is contemplated that various aspects of this invention can be implemented using a variety of alternate storage operating system architectures. As shown in FIG. 3, the storage operating system 216 includes a series of software layers organized to form an integrated network protocol stack providing data paths for clients to access information stored on the storage system using file-access protocols.

The protocol stack includes a media access layer 302 of network drivers (e.g., an Ethernet driver) that interfaces to network communication and protocol layers, such as the Internet Protocol (IP) layer 304 and the transport layer 306 (e.g., TCP/UDP protocol). A file-access protocol layer provides multi-protocol data access and, to that end, includes support for the Hypertext Transfer Protocol (HTTP) protocol 312, the NFS protocol 308 and the CIFS protocol 310. In addition, the storage operating system 216 may include support for other protocols, including, but not limited to, the direct access file system (DAFS) protocol, the web-based distributed authoring and versioning (WebDAV) protocol, the Internet small computer system interface (iSCSI) protocol, and other functionally appropriate protocols. The storage operating system 216 also includes a disk storage layer 320 that implements a disk storage protocol, such as a RAID protocol, and a disk driver layer 318 that implements a disk control protocol, such as the small computer system interface (SCSI).

Bridging the disk software layers with the network and file-system protocol layers is a file system layer 314 of the storage operating system 216. In one implementation, the file system layer 314 implements a file system having an on-disk format representation that is block-based using, e.g., 4-kilobyte (KB) data blocks and using inodes to describe the files. An inode is a data structure used to store information about a file, such as ownership of the file, access permission for the file, size of the file, name of the file, location of the file, etc. In response to receiving a client's file access request, file system layer 314 generates operations to load (retrieve) the requested data from storage devices if it is not resident in the storage system's “in-core” memory. An external file handle in the client request typically identifies a file or directory requested by the requesting client. Specifically, the file handle may specify a generation number, inode number and volume number corresponding to the client's requested data.

If the information is not resident in the filer's “in-core” memory, the file system layer 314 indexes into an inode file using the received inode number to access an appropriate entry and retrieve a logical volume block number. The file system layer 314 then passes the logical volume block number to the disk storage layer 320 (RAID), which maps that logical number to a disk block number and sends the latter to an appropriate driver (for example, an encapsulation of SCSI implemented on a fibre channel disk interconnection) of the disk driver layer 318. The disk driver accesses the disk block number from the storage devices and loads the requested data in memory for processing by the storage system. Upon completion of the request, the storage operating system 216 on the storage system returns a response (e.g., a conventional acknowledgement packet defined by the CIFS specification) to the client over the network.
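The read path just described, from inode number to logical volume block number to disk block number, can be sketched in Python using plain dictionaries as stand-ins for each layer. All of the names and table contents here are hypothetical, and a real inode references many blocks rather than the single block shown.

```python
def read_block(inode_number, inode_file, raid_map, disks, in_core):
    """Resolve an inode number to data, consulting in-core memory first."""
    if inode_number in in_core:
        return in_core[inode_number]
    # File system layer: index into the inode file to get a logical volume block number.
    logical_vbn = inode_file[inode_number]
    # Disk storage (RAID) layer: map the logical number to a disk and disk block number.
    disk_id, disk_block = raid_map[logical_vbn]
    # Disk driver layer: access the block on the storage device and load it in memory.
    data = disks[disk_id][disk_block]
    in_core[inode_number] = data
    return data

# Hypothetical tables standing in for the inode file, RAID mapping and two disks.
inode_file = {1042: 77}
raid_map = {77: (1, 5)}
disks = {0: {}, 1: {5: b"requested data"}}
in_core = {}

print(read_block(1042, inode_file, raid_map, disks, in_core))
```

A second request for the same inode is served from the in-core dictionary without touching the lower layers.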

It should be noted that the software “path” 316 through the storage operating system layers described above needed to perform data storage access for the client request received at the filer may alternatively be implemented in hardware or a combination of hardware and software. That is, in an alternate embodiment of the invention, the storage access request path 316 may be implemented as logic circuitry embodied within a field programmable gate array (FPGA) or an application specific integrated circuit (ASIC). This type of hardware implementation increases the performance of the file service provided by storage operating system 216 in response to a file system request packet issued by a client. Moreover, in another alternate embodiment of the invention, the processing elements of network adapter 208 and multi-port storage adapter 204 in FIG. 2 may be configured to offload some or all of the packet processing and storage access operations, respectively, from processor 206 to thereby increase the performance of the file service provided by the storage system.

In accordance with aspects of the present invention, storage operating system 216 further implements a dynamic event suppression module 328. As described in further detail later herein, the dynamic event suppression module 328 is capable of suppressing the generation of event information deemed to be repetitive or redundant. For example, multiple entries in a log will occur when an event in the storage environment continues to occur over time without resolution. These multiple entries in the log may be identified by the dynamic event suppression module 328 as redundant or repetitive based upon an identifier, error code or other marker common to each log entry. The dynamic event suppression module 328 isolates a high frequency subset of these recurring events based on the log entries and reduces the amount of information generated for the event logs and Autosupport system.

Further, an Autosupport event configuration module 326 designed in accordance with aspects of the present invention also operates to further reduce the amount of event information stored in the event logs and transmitted to the Autosupport system. This aspect of the present invention uses predetermined configuration information stored in event configuration file 322 and Autosupport registry 324 to directly limit or suppress specific events named in advance. For example, the configuration file 322 or Autosupport registry 324 may be configured to turn off the generation of certain event information or to greatly limit the event information generated upon each occurrence of a particular event in the storage system.

Referring to FIG. 4, a flowchart diagram provides the operations for configuring the generation of event information in accordance with one implementation of the present invention. Initially, an Autosupport client on the storage system receives an indication of an event from a subsystem on the storage system. Typically, the event indication includes a request to log the event information surrounding the occurrence of the event in an event log as well as transmit the event information to an Autosupport system (402). In many cases, the Autosupport system is remotely located over a network or the Internet and not on the storage system. However, the event log used to store the aggregate of the event information is generally stored locally on the storage system or in close proximity.

Once the initial indication is received, the storage system determines if the Autosupport client on the storage system has been configured to process individual events in addition to the default settings (404). This determination generally checks to see if either an event configuration file or Autosupport registry has been created and populated with configuration data. Depending on the implementation, the event configuration file (e.g., a text-based flat file) or the Autosupport registry (e.g., a compiled and indexed database) names one or more different events and lists various parameters for modifying the default processing of certain named events. If neither the event configuration file nor the Autosupport registry can be located, aspects of the present invention will process events according to an overall default setting or strategy. For example, turning on Autosupport by default will cause all event information to be transmitted to an Autosupport system (406) and then entered in event logs (414) in the absence of an event configuration file or registry. Autosupport can also be turned off by default, in which case the event information is neither stored in an event log nor transmitted to the Autosupport system (not illustrated).

Alternatively, the Autosupport client on the storage system next determines if the particular named event (402) has specific Autosupport notification parameters specified in either the event configuration file or Autosupport registry (408). Accordingly, if a particular named event cannot be located in a configuration file or Autosupport registry then the event is processed according to a default setting for the overall storage system (406) as previously described. However, if the event is named and located in the event configuration file or Autosupport registry then delivery of the event information is modified according to the parameter settings associated with the event (410). These parameter settings can explicitly indicate that no event information should be recorded, or that only a limited amount of event information should be processed and only under certain conditions.

Essentially, the Autosupport client associated with the storage system may then transmit notification of the event information to the Autosupport system using the modified delivery parameters as configured (412). For example, the delivery parameters may limit how many times event information from the same named event can be transmitted in a particular time interval. Similarly, Autosupport clients will also update event logs with event information according to the parameters specified for the particular event (414).
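By way of illustration only, the decision flow of FIG. 4 (steps 404 through 410) might be sketched as follows. The names DeliveryParams and resolve_delivery, the dictionary representation of the configuration, and the choice of the longest matching prefix when several entries apply are assumptions made for this sketch and are not taken from the described implementation.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class DeliveryParams:
    """Hypothetical container for per-event delivery parameters."""
    content: str = "complete"  # "none", "complete", or "pager"
    timer: int = 0             # seconds between similar messages; 0 = always send


def resolve_delivery(event_name: str,
                     config: Optional[Dict[str, DeliveryParams]],
                     autosupport_on: bool = True) -> Optional[DeliveryParams]:
    """Return the delivery parameters for an event, or None to do nothing."""
    if config:  # step 404: a configuration file or registry was found
        # Step 408: an entry applies if it names the event or a prefix of it.
        matches = [name for name in config
                   if event_name == name or event_name.startswith(name + ".")]
        if matches:
            # Step 410: assumed tie-break, the most specific entry wins.
            return config[max(matches, key=len)]
    # Step 406: fall back to the overall default setting.
    return DeliveryParams() if autosupport_on else None
```

In this sketch an event absent from the configuration falls through to the system-wide default, which mirrors the path to step 406 described above.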

FIG. 5 contains an excerpt of a configuration file having configurable parameter entries for a named event in accordance with one implementation of the present invention. In this example, the first field asuptrigger indicates the Autosupport event being configured. In this case asup.msg.cli.doit corresponds to events generated when the user manually issues the autosupport.doit command from a command line interface (CLI). If the entry is shortened to asuptrigger=asup.msg then all Autosupport events with the asup.msg event prefix will be modified by the particular configuration parameters that follow.

In the next field, the configuration entry autosupport.support.to.content indicates the data content type that can be sent to the autosupport.support.to recipients. In this example, the data content types are: “none”, which is interpreted to mean send nothing; “complete”, which is interpreted to mean send detailed event information; and “pager”, which is interpreted to mean send a short text note. Consequently, the entry autosupport.support.to.content=complete is interpreted to mean that a detailed Autosupport message will be sent to the autosupport.support.to recipient.

The next field autosupport.support.to.timer indicates how often this Autosupport message should be sent to recipients named in the entry autosupport.support.to. For example, a value of 0 indicates that the Autosupport message should always be sent to autosupport.support.to recipients. A non-zero positive value instead indicates the time interval in seconds before posting the next similar Autosupport message to autosupport.support.to recipients. The difference between the current time and the last time the same Autosupport event was generated is computed, and if the difference is greater than or equal to the timer interval for the event in autosupport.support.to.timer, then the Autosupport event is posted.
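The timer comparison just described can be sketched, under stated assumptions, as a single function. The function name should_post is hypothetical, and posting the first occurrence of an event unconditionally is an assumption, since the text does not say what happens when no earlier occurrence exists.

```python
from typing import Optional


def should_post(now: float, last_generated: Optional[float], timer: int) -> bool:
    """Decide whether an Autosupport message is posted for this occurrence."""
    if timer == 0:
        return True   # a value of 0 means always send
    if last_generated is None:
        return True   # assumption: nothing to compare against, so post
    # Post only if at least `timer` seconds have elapsed since the last one.
    return (now - last_generated) >= timer
```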

Lastly, the entries autosupport.to.content=complete and autosupport.to.timer=300 indicate that autosupport.to recipients will receive detailed Autosupport messages at a time interval of 300 seconds. Likewise, the entries autosupport.noteto.content=pager and autosupport.noteto.timer=300 indicate that autosupport.noteto recipients should receive a shorter Autosupport note, also at a time interval of 300 seconds.
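For illustration, the entries discussed with respect to FIG. 5 could be read by a small parser such as the one below. The exact layout of the file is not reproduced in this description, so the one key=value entry per line layout, the value 0 for autosupport.support.to.timer, and the names SAMPLE_CONFIG and parse_event_config are all assumptions of this sketch.

```python
from typing import Dict

# Hypothetical excerpt; the support.to timer value is assumed for illustration.
SAMPLE_CONFIG = """\
asuptrigger=asup.msg.cli.doit
autosupport.support.to.content=complete
autosupport.support.to.timer=0
autosupport.to.content=complete
autosupport.to.timer=300
autosupport.noteto.content=pager
autosupport.noteto.timer=300
"""


def parse_event_config(text: str) -> Dict[str, Dict[str, dict]]:
    """Map each asuptrigger name to {recipient entry: {"content", "timer"}}."""
    events: Dict[str, Dict[str, dict]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip()
        if key == "asuptrigger":
            # A new named event (or event prefix) begins here.
            current = events.setdefault(value, {})
        elif current is not None:
            # Split "autosupport.noteto.timer" into recipient entry and field.
            recipient, _, field = key.rpartition(".")
            current.setdefault(recipient, {})[field] = (
                int(value) if field == "timer" else value)
    return events
```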

Implementations of the present invention may additionally use dynamic event suppression to limit the amount of event information. It is contemplated that dynamic event suppression can be used in combination with the Autosupport client and parameters in the event configuration file as described above with respect to FIG. 3. For example, managing notification or transmission of events may take place according to specific delivery parameters set up for each particular event and a determination that the events are in either a high frequency subset of recurring events or a low frequency subset of recurring events.

Accordingly, FIG. 6 is a block diagram and flowchart of the operations for dynamic event suppression in accordance with one implementation of the present invention. As a preliminary step, the dynamic event suppression operation receives a target event log size value to limit the size of an event log (502). For example, the dynamic event suppression may be configured with a default predetermined target event log size and later query a system administrator to modify the target event log size, either interactively when setting up the dynamic event suppression processing or in a preferences setting area for the dynamic event suppression operation. The maximum value specified may vary depending on the installation details of the storage system and will operate as an upper limit on the size any event log file can reach.

Next, the storage system gathers event information in response to the events. At first, the event information for all events is stored in an event log associated with the storage system (504). Little event information is discarded initially as the preliminary event information gathered determines which group of events should be dynamically suppressed.

Eventually, the dynamic event suppression operation determines if the event log size has reached a predetermined fraction of the target event log size value previously indicated (506). In one implementation, it may be sufficient to allow the events and event information stored in the event log to reach approximately 50% of the initial target event log size. Setting the predetermined fraction level low enough ensures that the event log does not rapidly fill up to the maximum or target log size with potentially unwanted event information. For example, the dynamic event suppression operation will begin the process of eliminating or suppressing certain entries in a log file well before the number of log file entries becomes so large that the resulting log file is unwieldy to manipulate or cannot store further entries. However, the predetermined fraction level must also be set high enough to provide a statistically sufficient number of data points before applying the dynamic event suppression operation as described in further detail below. For example, setting the fraction level to 50% means that the dynamic event suppression operation will not be invoked until the number of entries in the event log reaches at least 50% of the target event log size allowable for the event log file. Accordingly, the final value for the predetermined fraction level may be set higher or lower depending on the particular storage system installation.
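The check at step 506 amounts to a simple comparison, sketched below. The function name suppression_due and the use of a single size unit (entries or bytes) for both arguments are assumptions; the 0.5 default reflects the approximately 50% figure mentioned above.

```python
def suppression_due(log_size: int, target_size: int, fraction: float = 0.5) -> bool:
    """Return True once the log has reached the given fraction of its target size."""
    if target_size <= 0:
        raise ValueError("target_size must be positive")
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    # Step 506: compare the current size with the predetermined fraction.
    return log_size >= fraction * target_size
```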

Once sufficient event information has been gathered, one implementation of the present invention ranks recurring events in ascending order according to a frequency of their occurrence (508). Recurring events captured in the event log may be the result of one or several applications or threads repeatedly detecting certain events and generating the same or similar event information. For example, an application may detect that a particular LUN has gone offline and generate corresponding event information every 15 seconds. These events would be grouped together as a single recurring event and ranked relative to other recurring events based on their frequency at the time of the ranking.
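A minimal sketch of the grouping and ranking at step 508 follows. It assumes each log entry is a dictionary carrying a common identifier under the key "id"; that representation, the function name rank_recurring, and the alphabetical tie-break are assumptions made for illustration.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def rank_recurring(log_entries: Iterable[dict]) -> List[Tuple[str, int]]:
    """Group entries by identifier and rank the groups by ascending count."""
    counts = Counter(entry["id"] for entry in log_entries)
    # Sort by frequency first; ties are broken by identifier for determinism.
    return sorted(counts.items(), key=lambda item: (item[1], item[0]))
```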

Next, the dynamic event suppression operation determines if there is a subset of the recurring events occurring at a higher frequency than other recurring events in the ranking (510). It is contemplated that there may be different approaches to separating high frequency recurring events from low frequency recurring events. In one implementation, high frequency recurring events can be measured relative to the frequency of the other recurring events, as described in further detail later herein with respect to FIG. 7 and FIG. 8. Alternatively, the high frequency recurring events may be those that exceed an absolute predetermined threshold value. Low frequency recurring events would be classified accordingly if they are below the absolute threshold value. The absolute predetermined threshold value can be determined dynamically and may depend on the event frequency measurements associated with the highest and lowest recurring events. For example, the absolute predetermined threshold value may be selected as the mean or median frequency value from all the recurring events. It is also possible that aspects of the invention cannot adequately classify the recurring events as occurring at a higher or lower frequency. For example, the recurring events may occur with approximately the same frequency. In this latter case, it is possible that an immediate determination of the event log size is made (514) and event elimination operations are not performed (516).
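The absolute-threshold alternative can be sketched as follows, using the median as the dynamically selected threshold. The function name split_by_threshold is hypothetical, and placing events exactly at the threshold into the low frequency subset is an assumption, as the text does not address that boundary.

```python
from statistics import median
from typing import Dict, Optional, Tuple


def split_by_threshold(freqs: Dict[str, int],
                       threshold: Optional[float] = None
                       ) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Split recurring events into (low, high) subsets around a threshold."""
    if threshold is None:
        # Dynamically chosen threshold: the median of all frequencies.
        threshold = median(freqs.values())
    low = {name: f for name, f in freqs.items() if f <= threshold}
    high = {name: f for name, f in freqs.items() if f > threshold}
    return low, high
```

When all events recur at the same frequency the high subset comes back empty, which corresponds to the case above in which the events cannot be classified and no elimination is performed.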

Once identified, aspects of the present invention eliminate a portion of the events and/or event information associated with the high frequency subset of events. Events from the high frequency subset of events are suppressed or eliminated until the frequency of events in the high frequency subset closely approximates the frequency of events in the low frequency subset of recurring events (512). In one implementation, a random number generator (RNG) function is used to select individual events from the high frequency subset of recurring events. This approach tends to optimally reduce the most redundant event information while not entirely eliminating the event information from consideration in the subsequent analysis. For example, if three different events are occurring at a high frequency in the system then aspects of the present invention will attempt to eliminate several, but not all, entries in the log associated with each of the three events. Use of the RNG helps to ensure that redundant entries in the log associated with each of the three events are reduced without being completely eliminated.
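One possible reading of the random elimination at step 512 is sketched below. It assumes log entries are dictionaries with an "id" key and that each high frequency event is thinned to a common target count; the function name thin_high_frequency, the seed parameter, and the floor of one retained entry per event are assumptions of this sketch.

```python
import random
from typing import Iterable, List, Optional


def thin_high_frequency(entries: List[dict], high_ids: Iterable[str],
                        target_count: int,
                        seed: Optional[int] = None) -> List[dict]:
    """Randomly drop entries of each high frequency event down to target_count."""
    rng = random.Random(seed)
    keep = max(1, target_count)  # never eliminate an event entirely
    drop = set()
    for event_id in high_ids:
        positions = [i for i, e in enumerate(entries) if e["id"] == event_id]
        excess = len(positions) - keep
        if excess > 0:
            # The RNG picks which individual entries are eliminated.
            drop.update(rng.sample(positions, excess))
    # Surviving entries keep their original order in the log.
    return [e for i, e in enumerate(entries) if i not in drop]
```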

Alternative implementations may eliminate a portion of events in the higher frequency subset until the frequency of events in the higher frequency subset approximates a predetermined proportion of the frequency of events in the lower frequency subset. For example, aspects of the present invention may reduce the higher frequency subset from 20,000 entries in the event log to only 10,000 entries based upon a 2× multiple of the 5,000 event entries from the lower frequency subset of recurring events.
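The arithmetic of this proportional variant is small enough to state directly. In the sketch below, the function name high_subset_target is hypothetical, and capping the result at the current size of the high frequency subset is an assumption so that the target never exceeds what is already in the log.

```python
def high_subset_target(low_count: int, multiple: float, high_count: int) -> int:
    """Target number of high frequency entries for a given multiple of the low subset."""
    return min(high_count, int(low_count * multiple))


# Figures from the example above: a 2x multiple of 5,000 low frequency entries.
target = high_subset_target(low_count=5_000, multiple=2, high_count=20_000)
to_eliminate = 20_000 - target
```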

The dynamic suppression operation effectively reduces the difference in frequency between the low frequency subset and high frequency subset of recurring events. If the event log has been sufficiently reduced and/or has not reached the target log size value (514) then additional event information may be added to the log as events occur in the storage environment or storage system (504). Despite attempts to suppress the recurring events, if the size of the event log does eventually reach the target log size value then aspects of the present invention will archive the event log and create a new event log to store subsequent events associated with the storage environment or storage system (516). In this latter case, conventional methods of archiving or eliminating an event log may be implemented and include log tailing, rotating log files or aging the logs with a first-in-first-out (FIFO) type queue.

FIG. 7 is another flow chart diagram of the operations used by one implementation of the present invention to identify a high frequency subset of recurring events suitable for dynamic event suppression. In this example, the dynamic suppression operation first receives a threshold differential value to distinguish between the low frequency subset of recurring events and the high frequency subset of recurring events (602). Essentially, the threshold differential value can be specified on a percentage basis as a minimum difference between the two sets of recurring events. For example, this minimum threshold differential value may be set to 30% or any other value as deemed suitable under the circumstances. In that case, if the lower frequency subset of recurring events occurs 300 times during a time interval, the higher frequency recurring events would occur at a rate at least approximately 30% higher, or 390 or more occurrences during the same time interval. Next, one implementation of the present invention performs pairwise comparison of the recurring events ranked in ascending order of their frequency of recurrence (604). Ranking the recurring events in sequence ensures the largest gap between lower and higher recurring events will be readily identified.

The dynamic event suppression operation determines if the comparison reveals a sufficient difference in the frequency of the recurring events. If the difference does not exceed the threshold differential value or percentage then the next pair of recurring events in the ranking is compared (606-No). This continues until the pairwise comparison indicates a difference that exceeds the threshold differential value (606-Yes). One implementation of the present invention then classifies recurring events at or below the lower recurring frequency of events as the low frequency subset of recurring events. For example, events occurring less than or equal to 23 times in FIG. 8A during a time interval may be classified as occurring at a lower frequency while those occurring more than 23 times may be classified as higher frequency. The recurring events above the lower recurring frequency of events are all classified as included in the higher frequency subset of recurring events. While it is not illustrated explicitly, it is also possible that no comparison exceeds the threshold differential value and the recurring events cannot be classified as either higher or lower recurring events.
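The pairwise comparison of steps 604 and 606 might be sketched as follows. The sketch assumes the threshold differential is a relative difference between adjacent frequencies in the ascending ranking, consistent with the 300 versus 390 example, and stops at the first pair exceeding it; the function name split_on_gap is hypothetical.

```python
from typing import List, Optional, Tuple


def split_on_gap(freqs: List[int],
                 delta: float) -> Optional[Tuple[List[int], List[int]]]:
    """Split ranked frequencies at the first adjacent gap exceeding delta."""
    ranked = sorted(freqs)
    for i in range(len(ranked) - 1):
        lower, higher = ranked[i], ranked[i + 1]
        # Step 606: relative difference, written without division.
        if higher - lower > delta * lower:
            return ranked[:i + 1], ranked[i + 1:]
    return None  # no pair exceeded the threshold; events cannot be classified
```

With the frequencies of FIG. 8A and a 30% differential, no adjacent pair among 11, 14, 18, 21 and 23 exceeds the threshold, so the split falls between 23 and 58.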

FIGS. 8A and 8B graphically illustrate the dynamic suppression and elimination of event information in accordance with one implementation of the present invention. In FIG. 8A a bar chart plots event frequency along a vertical axis and enumerates the recurring named events along the horizontal axis. It can be seen that the recurring events are ordered in increasing frequency as follows: 11, 14, 18, 21, 23, 58, 59, 61, 62, and 64. In this example, the symbol δ represents the threshold differential value. In one implementation, a graphical user interface (GUI) tool displays the graphs illustrated in FIG. 8A and FIG. 8B as part of a suite of tools for managing the event logs associated with a storage environment, computer or other computer-like systems. The GUI tool for managing the event logs can graphically illustrate how much of the information in the logs has been eliminated or suppressed in accordance with aspects of the present invention.

As illustrated in FIG. 8A, the differential in the recurring frequency is largest between 23 and 58 and is represented by Δ. Assuming Δ is greater than δ, aspects of the present invention would then classify recurring events with frequencies 11, 14, 18, 21, and 23 as low frequency events and recurring events with frequencies 58, 59, 61, 62 and 64 as high frequency events. The average frequency of the low frequency events is 17.4 (freq_(ave-low)=17.4) and the average frequency of the high frequency events is 60.8 (freq_(ave-high)=60.8).

In accordance with one implementation of the present invention, events are randomly eliminated in the subset of events associated with the high frequency events until freq_(ave-low)≅freq_(ave-high). In this example, the random elimination of events in the high frequency events results in a new set of frequencies of 16, 18, 16, 20, and 17, as illustrated in FIG. 8B. By eliminating these events, the average frequency in the high frequency events closely approximates the average frequency in the low frequency events and the event information is effectively and efficiently suppressed.
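The figures quoted for FIG. 8A and FIG. 8B can be checked with a few lines of arithmetic. The variable names below are chosen for this illustration only, and the frequencies are treated as event counts within one time interval.

```python
from statistics import mean

low = [11, 14, 18, 21, 23]           # low frequency events, FIG. 8A
high_before = [58, 59, 61, 62, 64]   # high frequency events, FIG. 8A
high_after = [16, 18, 16, 20, 17]    # high frequency events, FIG. 8B

freq_ave_low = mean(low)                 # 87 / 5
freq_ave_high = mean(high_before)        # 304 / 5
freq_ave_high_after = mean(high_after)   # 87 / 5

# Number of occurrences eliminated from the high frequency subset.
eliminated = sum(high_before) - sum(high_after)
```

In this example the high frequency subset ends with the same average as the low frequency subset, after 217 of its 304 occurrences are eliminated.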

In general, implementations of the invention can be implemented in digital electronic circuitry, or in computer hardware, firmware, software, or in combinations of them. Apparatus of the invention can be implemented in a computer program product tangibly embodied in a machine readable storage device for execution by a programmable processor, and method steps of the invention can be performed by a programmable processor executing a program of instructions to perform functions of the invention by operating on input data and generating output. The invention can be implemented advantageously in one or more computer programs that are executable on a programmable system including at least one programmable processor coupled to receive data and instructions from, and to transmit data and instructions to, a data storage environment, at least one input device, and at least one output device. Each computer program can be implemented in a high level procedural or object oriented programming language, or in assembly or machine language if desired; and in any case, the language can be a compiled or interpreted language. Suitable processors include, by way of example, both general and special purpose microprocessors. Generally, the processor receives instructions and data from a read only memory and/or a random access memory. Also, a computer will include one or more secondary storage or mass storage devices for storing data files; such devices include magnetic disks, such as internal hard disks and removable disks; magneto optical disks; and optical disks. Storage devices suitable for tangibly embodying computer program instructions and data include all forms of non-volatile memory, including by way of example semiconductor memory devices, such as EPROM, EEPROM, and flash memory devices; magnetic disks such as internal hard disks and removable disks; magneto optical disks; and CD ROM disks. Any of the foregoing can be supplemented by, or incorporated in, ASICs (application specific integrated circuits).

While specific embodiments have been described herein for purposes of illustration, various modifications may be made without departing from the spirit and scope of the invention. Accordingly, the invention is not limited to the above-described implementations, but instead is defined by the appended claims in light of their full scope of equivalents. For example, implementations of the present invention suggest using an average frequency of events as part of the criteria for eliminating events. However, there are many other measurements of frequency and approaches to eliminating the recurring events from the group of recurring events classified as recurring with a higher frequency. Also, it is contemplated that there are other methods of dividing or classifying the recurring events into low frequency and high frequency recurring events other than those proposed and described above. Furthermore, aspects of the present invention are described in conjunction with an Autosupport system; however, it is contemplated that various implementations can be used with many different types of support systems and to manage logs of event information stored on local storage systems, remote storage systems, computers and many other devices that may create event logs to track events.

This description of the invention should be understood to include all novel and non-obvious combinations of elements described herein, and claims may be presented in this or a later application to any novel and non-obvious combination of these elements. The foregoing embodiments are illustrative, and no single feature or element is essential to all possible combinations that may be claimed in this or a later application. Unless otherwise specified, steps of a method claim need not be performed in the order specified. The invention is not limited to the above-described implementations, but instead is defined by the appended claims in light of their full scope of equivalents. Where the claims recite “a” or “a first” element or the equivalent thereof, such claims should be understood to include incorporation of one or more such elements, neither requiring nor excluding two or more such elements.

1.-18. (canceled)
19. An apparatus, comprising: a memory; and logic, at least a portion of the logic in circuitry coupled to the memory, the logic to: receive a first indication of a first event from a first client of a storage environment; select a configuration file associated with the first client; identify the first event as a named event in the configuration file; and modify processing of the first event according to a first set of one or more delivery parameters associated with the named event in the configuration file.
20. The apparatus of claim 19, the first indication comprising a request to transmit a notification to the support system.
21. The apparatus of claim 20, the logic to suppress the request to transmit the notification based on the first set of one or more delivery parameters.
22. The apparatus of claim 20, the logic to transmit the notification based on the first set of one or more delivery parameters.
23. The apparatus of claim 19, the first indication comprising a request to log the event in the event configuration file.
24. The apparatus of claim 23, the logic to deny the request to log the event based on the first set of one or more delivery parameters.
25. The apparatus of claim 23, the logic to log the event based on the first set of one or more delivery parameters.
26. The apparatus of claim 19, the logic to: receive a second indication of a second event from the first client of the storage environment; select the configuration file associated with the first client; identify the second event as absent from the configuration file; and process the second event according to a default set of one or more delivery parameters.
27. The apparatus of claim 19, the logic to: receive a second indication of a second event from a second client of a storage environment; determine the second client is absent from an association with a configuration file; and process the second event according to a default set of one or more delivery parameters.
28. A computer-implemented method, comprising: receiving a first indication of a first event from a first client of a storage environment; selecting a configuration file associated with the first client; identifying the first event as a named event in the configuration file; and modifying processing of the first event according to a first set of one or more delivery parameters associated with the named event in the configuration file.
29. The computer-implemented method of claim 28, the first indication comprising a request to transmit a notification to the support system.
30. The computer-implemented method of claim 29, comprising suppressing the request to transmit the notification based on the first set of one or more delivery parameters.
31. The computer-implemented method of claim 29, comprising transmitting the notification based on the first set of one or more delivery parameters.
32. The computer-implemented method of claim 28, the first indication comprising a request to log the event in the event configuration file.
33. The computer-implemented method of claim 32, comprising denying the request to log the event based on the first set of one or more delivery parameters.
34. The computer-implemented method of claim 32, comprising logging the event based on the first set of one or more delivery parameters.
35. The computer-implemented method of claim 28, comprising: receiving a second indication of a second event from the first client of the storage environment; selecting the configuration file associated with the first client; identifying the second event as absent from the configuration file; and processing the second event according to a default set of one or more delivery parameters.
36. The computer-implemented method of claim 28, comprising: receiving a second indication of a second event from a second client of a storage environment; determining the second client is absent from an association with a configuration file; and processing the second event according to a default set of one or more delivery parameters.
37. One or more computer-readable media to store instructions that when executed by a processor circuit cause the processor circuit to: receive a first indication of a first event from a first client of a storage environment; select a configuration file associated with the first client; identify the first event as a named event in the configuration file; and modify processing of the first event according to a first set of one or more delivery parameters associated with the named event in the configuration file.
38. The one or more computer-readable media of claim 37, the first indication comprising a request to transmit a notification to the support system.
39. The one or more computer-readable media of claim 38, with instructions to suppress the request to transmit the notification based on the first set of one or more delivery parameters.
40. The one or more computer-readable media of claim 38, with instructions to transmit the notification based on the first set of one or more delivery parameters.
41. The one or more computer-readable media of claim 37, the first indication comprising a request to log the event in the event configuration file.
42. The one or more computer-readable media of claim 41, with instructions to deny the request to log the event based on the first set of one or more delivery parameters.
43. The one or more computer-readable media of claim 41, with instructions to log the event based on the first set of one or more delivery parameters.